Do unbalanced data have a negative effect on LDA?
Abstract
For two-class discrimination, Ref. [1] claimed that, when the covariance matrices of the two classes are unequal, a class-unbalanced dataset has a negative effect on the performance of linear discriminant analysis (LDA). By re-balancing 10 real-world datasets, Ref. [1] provided empirical evidence for the claim, using AUC (area under the receiver operating characteristic curve) as the performance metric. We suggest that the claim is vague, if not misleading; that [1] presents no solid theoretical analysis; and that, for unbalanced datasets, AUC can lead to a quite different conclusion about the discrimination performance of LDA from that reached using the misclassification error rate (ER). Our empirical and simulation studies suggest that, for LDA, re-balancing increases the median AUC only slightly (a small improvement in performance) but increases the median ER considerably (a large decline in performance). Our study therefore finds no reliable empirical evidence that a class-unbalanced dataset has a negative effect on the performance of LDA. In addition, re-balancing affects the performance of LDA for datasets with either equal or unequal covariance matrices, which indicates that unequal covariance matrices are not a key reason for the difference in performance between the original and re-balanced data.
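The comparison described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' experimental protocol: the Gaussian parameters, the 450/50 class sizes, the use of random under-sampling as the re-balancing method, evaluation on a test set with the original class ratio, and all function names (`simulate`, `fit_lda`, `auc`, `evaluate`, `compare`) are assumptions made for illustration. It fits a two-class LDA rule on original and re-balanced training data and reports the median AUC and median ER for each.

```python
import math
import random
from statistics import median


def simulate(n0, n1, rng):
    """Draw two 2-D Gaussian classes with unequal covariances (assumed parameters)."""
    x0 = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n0)]
    x1 = [(rng.gauss(1.5, 2.0), rng.gauss(1.5, 2.0)) for _ in range(n1)]
    return x0, x1


def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points, m):
    """Entries (sxx, sxy, syy) of the 2x2 scatter matrix around the mean m."""
    sxx = sum((p[0] - m[0]) ** 2 for p in points)
    sxy = sum((p[0] - m[0]) * (p[1] - m[1]) for p in points)
    syy = sum((p[1] - m[1]) ** 2 for p in points)
    return sxx, sxy, syy


def fit_lda(x0, x1):
    """Return (w, b) so that score(x) = w.x + b and class 1 is predicted if score > 0."""
    n0, n1 = len(x0), len(x1)
    m0, m1 = mean(x0), mean(x1)
    dof = n0 + n1 - 2
    # Pooled within-class covariance matrix.
    sxx, sxy, syy = ((a + c) / dof for a, c in zip(scatter(x0, m0), scatter(x1, m1)))
    det = sxx * syy - sxy * sxy
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    # w = S^{-1} (m1 - m0), using the closed-form inverse of a 2x2 matrix.
    w = ((syy * dx - sxy * dy) / det, (-sxy * dx + sxx * dy) / det)
    mid = ((m0[0] + m1[0]) / 2, (m0[1] + m1[1]) / 2)
    # Class priors are estimated from the training class proportions.
    b = -(w[0] * mid[0] + w[1] * mid[1]) + math.log(n1 / n0)
    return w, b


def auc(s0, s1):
    """Probability that a class-1 score exceeds a class-0 score (ties count half)."""
    wins = sum((a > c) + 0.5 * (a == c) for a in s1 for c in s0)
    return wins / (len(s0) * len(s1))


def evaluate(model, t0, t1):
    """Return (AUC, ER) of a fitted rule on test points t0 (class 0) and t1 (class 1)."""
    w, b = model
    s0 = [w[0] * p[0] + w[1] * p[1] + b for p in t0]
    s1 = [w[0] * p[0] + w[1] * p[1] + b for p in t1]
    errors = sum(s > 0 for s in s0) + sum(s <= 0 for s in s1)
    return auc(s0, s1), errors / (len(s0) + len(s1))


def compare(reps=20, n_major=450, n_minor=50, seed=0):
    """Median (AUC, ER) over repeated simulations, original vs. re-balanced training."""
    rng = random.Random(seed)
    runs = {"original": [], "rebalanced": []}
    for _ in range(reps):
        x0, x1 = simulate(n_major, n_minor, rng)
        t0, t1 = simulate(n_major, n_minor, rng)  # test set keeps the original ratio
        x0_bal = rng.sample(x0, n_minor)  # random under-sampling of the majority class
        runs["original"].append(evaluate(fit_lda(x0, x1), t0, t1))
        runs["rebalanced"].append(evaluate(fit_lda(x0_bal, x1), t0, t1))
    return {
        name: (median(a for a, _ in r), median(e for _, e in r))
        for name, r in runs.items()
    }


if __name__ == "__main__":
    for name, (med_auc, med_er) in compare().items():
        print(f"{name:>10}: median AUC = {med_auc:.3f}, median ER = {med_er:.3f}")
```

Because AUC depends only on the ranking of the scores while ER also depends on the decision threshold (here set by the estimated class priors), the two metrics can respond differently to re-balancing. The size and direction of any difference depend on the assumed simulation settings.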
Journal:
- Pattern Recognition
Volume 41
Publication date: 2008